1. Overview

openNG is an open-source, non-commercial revisioned research and intelligence application. Its 
purpose is to act as an organizational tool for large sets of interconnected data, and as a 
backend/framework for retrieving and parsing third-party data as requested by an end user. It is meant 
to "level the playing field" by making such software freely available to anybody who would have a 
need for it, without restrictions. The model that openNG uses is similar to that of a node graph. 

The following are some envisioned use cases for the application: 

• A number of journalistic organizations investigate the funding for a political candidate. They
need to tie together a number of corporations and individuals to find out where the money is 
coming from, and why it was contributed to the campaign of that candidate. 

• Activists investigate a corporation that has blocked funding to a whistleblower organization. 
Their goal is to find those responsible for the final decision on the blockade, and to determine 
their motivations for that decision. 

The above is not an exhaustive list; there will almost certainly be many other use cases for openNG.

The following items are currently out of scope, but may be added later as manpower allows: 

• Automated natural language processing (NLP) to determine the meaning of texts. Currently it is 
left up to the end user to extract meaning from data, when this data is not provided in a regular 
format. While some basic recognition (e.g. for dates and geographical locations) may be added,
full-blown natural language processing is considered too complex and time-consuming to 
implement at this point, with little added benefit. 

While a basic implementation would be relatively simple, this would result in many false 
positives and misinterpretations. Implementations that decrease the quality of the dataset are 
considered unacceptable. 

• Automatic creation of relationships between entities; while openNG will suggest other items 
that may be of interest (because, for example, they share a lot of properties or relationships or 
can otherwise be connected), it won't create relationships by itself. Similar data quality concerns 
apply as for natural language processing. 

• A mobile interface (as part of the primary frontend). openNG is a data-intensive application,
and as mobile platforms are not generally suited to this kind of application, developing a
mobile interface is not a priority.

The following items are explicitly out of scope, and are extremely unlikely to ever be considered: 

• Full integration with third-party software suites such as Maltego or Palantir. 

• Native mobile applications. There is no reason to not simply implement a mobile interface as 
part of the primary frontend. Similar reasoning applies as above. 

The intention is for openNG to provide research capabilities that are on par with or exceeding those of
commercially available software such as Palantir, internal software employed by intelligence agencies
such as GCHQ and the NSA, and other software suites that are currently only accessible to a select
group of individuals or organizations.

The primary focus of openNG is on the following points, in decreasing order of importance: 

1. Usability through intuitive UX. Usage of the application should be obvious to a layman, without 
requiring application-specific documentation, other than minimal inline documentation within 
the application. No in-person training should be required whatsoever - it should be possible to 
simply download and install the application, and begin using it straight away. 

2. Accessibility. This is primarily accomplished by developing the frontend of openNG to be 
completely browser-based, built on open standards, and with the ability to run it in any modern 
standards-compliant browser. 

3. Full revisioning. Old revisions of data items and deleted items are always retained, including
metadata such as their author and revision date. Features such as an audit log and fine-grained
access control using tags ensure that no data is lost.

4. Saving time through automated parsing of known data formats. This includes, for example, 
being able to understand WHOIS data, the structure of online news articles, and so on. 
Emphasis is on correctness over completeness; while false positives are always a possibility 
when parsing third-party formats, the software should explicitly fail when it encounters an 
unclear data format, rather than attempting to "parse what it can". 

5. Lack of a pre-defined set of possible data elements. While many other research software suites 
are tailored towards processing specific types of data - e.g. identities of individuals, network
hosts, etc. - openNG is designed to be data-agnostic. It will treat any entity as an item with 
key/value pairs of properties, and unidirectional or bidirectional relationships. All entity, 
relationship, and property types are defined ad-hoc, and can be given additional meaning at a 
later point when this turns out to be necessary. This makes it possible to use openNG for 
processing any kind of data that can be represented as an entity with properties and 
relationships. 

6. Cross-instance collaboration, where an 'instance' is defined as an installation of openNG on a
system. In practice, this means that individuals or organizations can 'federate' between
their individual instances of openNG to exchange data on the fly. Rather than using an
import/export system, queries are run against federated hosts in real time, and UUIDs are used
across the board. While this introduces an availability requirement for remote instances, it
removes the need to regularly synchronize data, to resolve merge conflicts, and to map IDs
between different systems.
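
Point 4 above (fail explicitly rather than "parse what it can") can be sketched as follows. This is a minimal illustration only, written in Python rather than the project's CoffeeScript. The "Key: value" record format, the required keys, `parse_record` and `UnclearFormatError` are all hypothetical and not part of openNG.

```python
class UnclearFormatError(ValueError):
    """Raised when input does not match the expected format exactly."""


# Hypothetical, deliberately tiny "known format": one "Key: value" pair per line.
REQUIRED_KEYS = {"domain", "registrar"}


def parse_record(text: str) -> dict:
    """Parse a whole record or fail; never return a partial result."""
    fields = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no data
        key, separator, value = line.partition(":")
        if not separator or not key.strip() or not value.strip():
            raise UnclearFormatError(f"line {line_number}: expected 'Key: value'")
        normalized_key = key.strip().lower()
        if normalized_key in fields:
            raise UnclearFormatError(
                f"line {line_number}: duplicate key {normalized_key!r}"
            )
        fields[normalized_key] = value.strip()
    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise UnclearFormatError(f"missing required keys: {sorted(missing)}")
    return fields
```

The design choice here is that any line the parser does not fully understand aborts the whole record, so a caller either receives a complete result or an error it can show to the user; nothing ambiguous reaches the dataset.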

2. Technology stack 

The target platform for openNG is as follows: 

• Backend: Any modern Linux system, or any system with a similar architecture that supports 
the backend technologies used. 

• Frontend: Any modern standards-compliant browser, on any operating system and any 
reasonably modern system. openNG should function correctly on modern systems that do not 
have many resources available, and are under e.g. RAM or CPU constraints.

The technology stack is as follows: 



Backend: CoffeeScript, using Node.js 

Rationale: CoffeeScript is an expressive and concise language that provides a good amount of
syntactic sugar for working with data and application logic in general. It compiles to JavaScript,
and can thus run without an additional VM.

Node.js is an efficient runtime for working with concurrent streams of data due to its
asynchronous nature, and has a well-documented core API and healthy third-party module
ecosystem. Interoperability between different third-party modules is generally good; a number
of high-level mechanics (streams, promises, ...) are (informally) standardized and used by most
third-party modules. This means that little time should be lost on finding compatible modules.

Frontend: HTML, CSS 

Rationale: HTML and CSS are standardized formats for developing browser-based content. 
openNG will use HTML5/CSS3 features where appropriate, and all functionality will be 
natively supported by any modern standards-compliant browser. No external plugins (Java,
Adobe Flash, ...) will be used.

Frontend: CoffeeScript + Browserify 

Rationale: The rationale for using CoffeeScript is similar to that of the backend. Browserify is 
used to provide a Node.js-compatible API and bundle up code; this will make it easier to work 
on the frontend and backend at the same time, as similar APIs are exposed. 

Frontend: AppCore (working title), PureCSS, AngularJS 

Rationale: AppCore is an independently developed UI toolkit for browser-based applications, 
based on PureCSS and AngularJS. It provides reusable UI components with an unopinionated 
style - similar to what PureCSS offers - but with more options and functionality. Some of these 
features include autocompletion, "auto-duplication" of form fields as necessary, and "dockable" 
layout elements. 

Database: PostgreSQL 

Rationale: PostgreSQL is a performant and advanced open-source RDBMS. Unlike MySQL, it
is not subject to project ownership issues. PostgreSQL has a native UUID data type, as well as
native JSON data types (json since 9.2, binary jsonb since 9.4) that allow for schemaless
storage and querying of auxiliary data. Other features such as partial indexes are useful for
cleanly working with revisions of data.
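
As a rough illustration of why partial indexes help with revisions, the sketch below keeps every revision of a node but lets the database enforce that only one of them is current. It uses Python's built-in sqlite3 module purely as a self-contained stand-in for PostgreSQL (both accept `CREATE UNIQUE INDEX ... WHERE ...`); the table, column and function names are hypothetical and do not describe openNG's actual schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_revisions (
        revision_id INTEGER PRIMARY KEY,
        node_id     TEXT NOT NULL,   -- UUID as text; PostgreSQL has a native uuid type
        name        TEXT NOT NULL,
        author      TEXT NOT NULL,
        is_current  INTEGER NOT NULL DEFAULT 1
    );
    -- Partial unique index: at most one current revision per node,
    -- while any number of old revisions are retained.
    CREATE UNIQUE INDEX one_current_revision
        ON node_revisions (node_id) WHERE is_current = 1;
""")


def save_revision(node_id: str, name: str, author: str) -> None:
    """Retire the current revision (if any) and insert a new one."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE node_revisions SET is_current = 0 "
            "WHERE node_id = ? AND is_current = 1",
            (node_id,),
        )
        conn.execute(
            "INSERT INTO node_revisions (node_id, name, author) VALUES (?, ?, ?)",
            (node_id, name, author),
        )


node_id = str(uuid.uuid4())
save_revision(node_id, "Acme Corp", "alice")
save_revision(node_id, "Acme Corporation", "bob")

# Only the newest revision is marked current; the old one is still stored.
current = conn.execute(
    "SELECT name FROM node_revisions WHERE node_id = ? AND is_current = 1",
    (node_id,),
).fetchall()
history_count = conn.execute(
    "SELECT COUNT(*) FROM node_revisions WHERE node_id = ?", (node_id,)
).fetchone()[0]
```

Queries for the current state only touch the small partial index, while the full history remains available in the same table.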

Cache: Redis 

Rationale: Redis is a cross-platform high-performance data store with a number of well-defined 
collection types, and a relatively simple API. Time complexity of functions is clearly defined 
and adhered to. Redis will be used as a backing store for short-term caching of data, as well as
for fuzzy searches and fuzzy autocompletion. 

Communication: ZeroMQ, msgpack 

Rationale: ZeroMQ is a high-performance and lightweight cross-platform messaging library, with
support for a number of important message patterns such as request/reply and publish/subscribe. 
It will be used for both federation with other openNG instances, and communication with 
external utilities and plug-ins (such as infoserver providers; more on this later). 

msgpack is a high-performance cross-platform binary serialization format that supports
JSON-like data structures.

The use of ZeroMQ and msgpack rather than any language-specific serialization or
communication methods means that third-party utilities for openNG do not necessarily need to
be using the same technology stack, while still providing high performance and throughput with
low overhead (unlike, for example, HTTP).

Communication: HTTP/JSON API

Rationale: An HTTP API is exposed for (a subset of) the functionality of openNG. This API is
what is used by the frontend UI - however, it will be well-documented and have some degree of
backwards compatibility, so that it can also be used from other applications. JSON is used as
the serialization format.

The technologies used for processing searches in particular are subject to change, in case another 
solution turns out to work better. 

3. Underlying concepts 

Internally, the model that openNG uses is similar to that of a node graph - this is also reflected in the 
name of the project. The primary components of a dataset are nodes (or entities), edges (relationships), 
and properties (or attributes). For consistency, and to accurately illustrate the envisioned use cases for 
openNG, this document will use the following terms: nodes, relationships, and properties. 

Unlike many other node-graph-based applications, openNG does not work with distinctly different 
datasets, but maintains a single dataset that contains all nodes, optionally tagged by project. This allows 
one to more easily detect relationships between nodes that at first appear to belong to entirely different 
projects, but actually have an as-of-yet undiscovered relationship between them. 

3.1 Nodes 

Nodes, or 'entities', are best described as "things". A node could be a person, a company, an 
organization, a contract, a news article, an event, and so on. If something can have relationships, then 
it's probably a node. 

A less-than-obvious example would be a contract; while you could say that Alice is an employee of
Bob (thus, an 'employee of' relationship exists between Alice and Bob), you could also argue that the
employment contract itself is a node, Alice has an 'employee' relationship with that contract, and Bob
has an 'employer' relationship with that contract.

Effectively, when in doubt whether something should be represented as a node or a relationship, you 
should probably represent it as a node. As nodes can have properties but relationships cannot, this will 
allow you to specify additional data regarding the contract in our example, such as the starting date of 
employment. 

Abstract concepts such as 'events' can also be represented as nodes; you may have a node representing
a company dinner, with all of the employees attending listed as having an 'attendee of' relationship.
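
The contract example above can be made concrete with a small sketch. It is written in Python for illustration only; the `Node` and `Relationship` classes, their fields and the starting date are made up for this example and are not openNG's data model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # One key may hold several values, so each key maps to a list.
    properties: dict = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class Relationship:
    source: uuid.UUID
    target: uuid.UUID
    type: str


alice = Node("Alice")
bob = Node("Bob")

# Option 1: a direct relationship. There is nowhere to record a starting date,
# because relationships do not carry properties.
direct = Relationship(alice.id, bob.id, "employee of")

# Option 2: the contract is itself a node, so it can carry properties.
contract = Node("Employment contract", {"starting date": ["2014-01-01"]})
via_contract = [
    Relationship(alice.id, contract.id, "employee"),
    Relationship(bob.id, contract.id, "employer"),
]
```

In the second form, additional facts about the employment (end date, salary, the document itself as a source) can later be attached to the contract node without changing how Alice and Bob are modelled.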

3.2 Relationships 

Relationships, or 'edges' in more traditional node graph terminology, are connections between nodes. 
These can be unidirectional (from node A to B) or bidirectional (from both node A to B, and from B to 
A). An example of a unidirectional relationship would be 'subsidiary of' between two companies, while
an example of a bidirectional relationship would be 'living together with' for two people. 



There is no limit on how many relationships a node or a pair of nodes may have. Relationship types are 
defined ad-hoc; that is, a description of the relationship is entered on creation of a relationship, and this 
description is used to create a new relationship type if a type with that name does not exist yet. These 
types can be given additional meaning at a later point in time. 

Relationships have some additional optional metadata attributes: 

• Source: One or more sources that support the existence of the relationship. The concept of a 
'source' is discussed later on in this document. 

• Reliability: How reliable this information (and its source data) is: 'false', 'likely false', 
'uncertain', 'likely true', 'confirmed'. This will default to 'uncertain'. The application may treat 
relationships differently (e.g. when visualizing nodes) depending on their reliability.
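
A minimal sketch of how ad-hoc relationship types, direction and the default reliability might fit together is given below. It is in Python for illustration only; `RelationshipStore`, its methods and the use of plain strings as node identifiers (instead of UUIDs) are assumptions made for brevity, not openNG's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    source: str
    target: str
    type_name: str
    bidirectional: bool = False
    sources: list = field(default_factory=list)
    reliability: str = "uncertain"  # the default named in the text


class RelationshipStore:
    def __init__(self) -> None:
        self.types = {}          # type name -> meaning that may be added later
        self.relationships = []

    def add(self, source: str, target: str, description: str, **metadata) -> Relationship:
        # Ad-hoc typing: reuse the type if it exists, otherwise create it.
        self.types.setdefault(description, {})
        relationship = Relationship(source, target, description, **metadata)
        self.relationships.append(relationship)
        return relationship

    def reachable_from(self, node: str) -> set:
        """Nodes one step away, following each relationship's direction."""
        found = set()
        for r in self.relationships:
            if r.source == node:
                found.add(r.target)
            elif r.bidirectional and r.target == node:
                found.add(r.source)
        return found


store = RelationshipStore()
store.add("Acme Ltd", "Acme Holdings", "subsidiary of")
store.add("Alice", "Bob", "living together with", bidirectional=True)
store.add("Carol", "Dave", "living together with", bidirectional=True)
```

Note that the second 'living together with' relationship reuses the existing type rather than creating a new one, and that 'subsidiary of' can only be followed from the subsidiary to the parent.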

3.3 Properties 

Properties, or 'attributes', are bits of information associated with a node. Properties are represented
as key/value pairs; a single key may hold multiple values, with separate metadata for each value. 

Property types, like relationship types, are defined ad-hoc; the 'key' of a property is what will determine 
the name of the property type. Property types, however, can be given a more specific meaning at a later 
point - for example, a "date of birth" property when occurring for a "Person" node would be marked as 
representing a date. This kind of information is not immediately necessary when creating nodes or
properties, but can be used for purposes such as temporal or geospatial visualization (discussed later).

Examples of properties include the "date of birth" for a person, the "incorporation date" for a
corporation, or the geographical location of a company meeting or protest.

Like relationships, properties have some additional optional metadata attributes. These metadata 
attributes may be different for different values under the same key. This way, conflicting data can be 
represented accurately. 

• Source: One or more sources that support the value for this property. The concept of a 'source' 
is discussed later on in this document. 

• Reliability: How reliable this information (and its source data) is: 'false', 'likely false', 
'uncertain', 'likely true', 'confirmed'. This will default to 'uncertain'. The application will display 
values in decreasing order of reliability, and may otherwise treat properties differently (e.g.
when visualizing nodes) depending on their reliability. 
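
The following sketch shows one way multi-valued properties with per-value metadata could be stored and then displayed in decreasing order of reliability. It is in Python for illustration only; `PropertyValue`, `add_value`, `display_order` and the example dates and source are hypothetical.

```python
from dataclasses import dataclass, field

# Reliability levels from the text, from least to most reliable.
RELIABILITY_ORDER = ["false", "likely false", "uncertain", "likely true", "confirmed"]


@dataclass
class PropertyValue:
    value: str
    sources: list = field(default_factory=list)
    reliability: str = "uncertain"  # the default named in the text


def add_value(properties: dict, key: str, value: str, **metadata) -> None:
    """Append a value under a key; each value keeps its own metadata."""
    properties.setdefault(key, []).append(PropertyValue(value, **metadata))


def display_order(values: list) -> list:
    """Most reliable first; sorted() is stable, so ties keep insertion order."""
    return sorted(
        values,
        key=lambda v: RELIABILITY_ORDER.index(v.reliability),
        reverse=True,
    )


person = {}
add_value(person, "date of birth", "1970-01-01")
add_value(
    person,
    "date of birth",
    "1971-01-01",
    reliability="confirmed",
    sources=["birth certificate"],
)
add_value(person, "date of birth", "1969-01-01", reliability="likely false")

ordered = [v.value for v in display_order(person["date of birth"])]
```

Conflicting values are kept side by side rather than overwritten, so a researcher can see both the confirmed date and the claims that contradict it, each with its own sources.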



